Role: Healthcare Data Scientist – Real World Data
Mode of employment: C2C
Region: NY / NJ (Hybrid)
Skills: A Data Scientist with Real World Data (RWD) experience is a must, along with experience in clinical trial data, Large Language Models (LLMs), Machine Learning, Deep Learning, and NLP techniques. At least 10 years of hands-on experience with the OMOP CDM and OHDSI tools is required.
Job Overview:
As a Senior Data Scientist, your expertise in processing and understanding natural language data, along with your knowledge of Electronic Health Records (EHR) and laboratory report analysis, will be instrumental in driving our data science initiatives and innovations. This is particularly true for the development of rich, multimodal real-world datasets that accelerate RWD-driven drug development in the pharma domain.
Roles & Responsibilities
• Use NLP techniques and open-source Large Language Models (LLMs) such as Llama 2, Mixtral, and BERT to extract, process, and interpret unstructured medical data from diverse sources such as EHRs, medical notes, and laboratory reports.
• Collaborate with clinical scientists and data scientists to build efficient NLP models for healthcare, demonstrating an understanding of both the technical and medical aspects of the data.
• Conduct data cleaning, preprocessing, and validation to ensure the accuracy and reliability of insights derived from NLP processes.
What are we looking for?
• Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
• 10+ years of proven experience in NLP, with strong knowledge of NLP techniques such as Named Entity Recognition (NER), text summarization, and topic modelling, and of their application in healthcare.
• Strong foundation in machine learning concepts and techniques, including deep learning architectures, natural language processing, and text generation.
• Proficiency in Python and related libraries such as TensorFlow and PyTorch for model development and deployment.
• Experience working with the AWS cloud environment and large databases (e.g., Amazon Redshift).
• Experience managing the ML lifecycle using open-source tools (e.g., MLflow).
• Experience with Retrieval-Augmented Generation (RAG) and vector stores for storing and querying large volumes of unstructured healthcare documents.
• Detail-oriented, with strong analytical and problem-solving abilities.
• Strong teamwork and communication skills to collaborate with cross-functional teams, including data scientists, engineers, and domain experts.
• Ability to identify challenges in language model development and implement creative solutions to address them.
• Enthusiasm for staying up to date with the latest advancements in AI, NLP, and large language models.